Introduction

America is a highly diverse country. It is diverse not only in ethnicity, but also in income, industry, and law. This diversity opens the door to a variety of possible interactions among these variables. What factors drive the way that income is distributed in the United States? What factors reliably predict whether the average income per capita in a specific area is high or low? How do state-level variations in law and freedom affect income?

How are these questions SMART?

These questions are important because they illuminate many facets of the story of consumption in the United States. Income serves as a measure of both productivity and lifetime consumption (although this analysis does not disentangle the two). Although their scope is broad, the questions remain specific to the concepts of income, demographics, and freedom, and they share a consistent structure: how do demographics and freedom drive income in the United States at the census tract level?

These questions also correspond to a set of highly measurable (and, luckily, premeasured) variables. Income can be imputed from tax records, while ethnicity and work status are available from census forms. The Freedom variables are somewhat more abstract, but they come from an analysis by the Cato Institute (2015) that rigorously measures and weights the different legal stances each state might take. Answering the questions is made simpler by the cleanliness and availability of this data: since few data points are missing across the census tracts of the 50 states of interest, it is straightforward to form statistical tests.

Finally, these questions are relevant to policy makers who want to improve the incomes of their constituents, as well as to researchers interested in establishing a baseline for the average income they should expect a community to earn based on its demographics. These are critical questions, because the ability of communities to support themselves economically has massive impacts on the wellbeing of their members.

Content

First, the structure of the U.S. Census Bureau dataset and the variables it includes are examined. Second, the Cato Institute’s Freedom in the 50 States index is assessed as a potentially meaningful independent variable. Next, the groups of independent variables are presented, along with how each of them could affect the income per capita of a community. Then, an exploratory data analysis and some statistical tests are carried out to evaluate the significance of our variables, followed by a first preview of a regression analysis. Finally, the conclusion looks at the challenges and open questions that future analyses should address.

Dataset

U.S. Census Bureau Dataset

The U.S. Census Bureau conducts the yearly American Community Survey, a project that asks Americans around the country about several dimensions of their lives, including work, income, demographics, and other activities (U.S. Census Bureau, 2019). The 2015 dataset was available via Kaggle (MuonNeutrino, 2015) and included more than 74,000 observations with 37 columns (variables). The dataset includes two variables related to income: the median household income and the income per capita. Income per capita was preferred because it adjusts per person rather than per household, and the number of people living in a given household is unknown. The variable (IncomePerCap) is calculated as the average income per capita of the population of a specific census tract. But what is a census tract, and why use it?

Census tracts

Household income in America varies significantly by geographic location. The richest counties in the country are concentrated in urban areas near large metropolises, where most businesses are located; the Bay Area in Northern California, northeastern Virginia, and New York are some examples. However, counties are too inconsistent a unit for comparing variables across the country. There are 3,142 counties in a country of 300 million inhabitants (U.S. Census Bureau, 2019), and they differ widely. Texas, for example, has 254 counties (U.S. Census Bureau, 2017). California, a state with approximately 10 million more people than Texas, has only 58 counties (U.S. Census Bureau, 2017). In terms of population, California has the largest county in the country (Los Angeles), with more than 10 million inhabitants, whereas Texas has more than 80 counties with fewer than 10,000 people (U.S. Census Bureau, 2017). In terms of density, New York has four of the five densest counties in the country, some of them 60,000 times denser than counties in Hawaii, Alaska, or Nevada (U.S. Census Bureau, 2013).

In response to these inconsistencies, the U.S. Census Bureau delineated “census tracts” at the beginning of the twentieth century. A census tract is a “geographic region defined for the purpose of taking a census.” Over the years, the U.S. Census Bureau has established census tracts in every county in America. There are over 74,000 census tracts in the country, and a typical one has around 4,000 residents. This consistency is a strength: census tracts are by and large similar in population size, and that size does not vary much from state to state.

Freedom in the 50 States

The Freedom in the 50 States project presents a ranking of the American states based on how their policies promote freedom in the fiscal, regulatory, and personal realms. It was published by the Cato Institute, a libertarian-leaning think tank, in 2015 and receives biannual updates. The index gathers data on more than 230 variables to measure “local government intervention across a wide range of policy categories—from taxation to debt, from eminent domain laws to occupational licensing, and from drug policy to educational choice.” Before relying on this index, it is important to delve into the authors’ preferred definition of “freedom”. Fortunately, the methodology of the index is very transparent about its criteria. The chapter dedicated to “defining freedom” in the report states:

“We ground our conception of freedom on an individual rights framework. In our view, individuals should not be prevented from ordering their lives, liberties, and property as they see fit, so long as they do not infringe on the rights of others. […] This index attempts to measure the extent to which state and local public policies conform to this ideal regime of maximum, equal freedom. For us, the fundamental problem with state intervention in consensual acts is that it violates people’s rights.”

In other words, the Cato Institute report gives a higher score to states whose laws protect individual rights, free markets, and limited, small government, and it penalizes states with more regulations on trade and business and more restrictions on individual liberties. According to the report, examples of restrictions on freedom at the economic, personal, regulatory, or fiscal level include:

- Aspiring professionals wanting to ply a trade without paying onerous examination and education costs.
- Less-skilled workers priced out of the market by minimum wage laws.
- Arrests for non-drug victimless crimes, % of population.
- Prohibition of same-sex partnership.

Why use this dataset as potential variables that affect income per capita?

It is valuable to determine how different combinations of public policies in these areas can affect income per capita. This makes it possible to establish, at least roughly, whether higher or lower levels of fiscal, regulatory, or personal freedom (as defined by Cato’s formulas) have any relationship with the income per capita of a community.

Merging datasets

The Freedom in the 50 States dataset, however, was calculated at the state level rather than the census tract level. Therefore, after merging, each census tract carries the freedom scores of the state it belongs to. For example, since the fiscal policy score for Alabama was 0.122, all census tracts located in Alabama have a “fiscal policy score” of 0.122. Unfortunately, this means the project cannot inspect differences in policies below the state level (so policies enacted by municipalities are ignored).
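
The merge described above amounts to a lookup keyed on state. The following is a minimal sketch in Python, offered only as an illustration and not as the code used for the analysis. The tract records and the field names `tract_id`, `state`, `income_per_cap`, and `fiscal_policy` are hypothetical; only Alabama's fiscal policy score of 0.122 comes from the text.

```python
# State-level scores; only Alabama's 0.122 is taken from the text.
state_scores = {"Alabama": {"fiscal_policy": 0.122}}

# Hypothetical tract records (identifiers and incomes are invented for illustration).
tracts = [
    {"tract_id": "AL-0001", "state": "Alabama", "income_per_cap": 25000},
    {"tract_id": "AL-0002", "state": "Alabama", "income_per_cap": 18000},
    {"tract_id": "XX-0001", "state": "Not in index", "income_per_cap": 12000},
]

def merge_scores(tracts, state_scores):
    """Left-join state-level scores onto tracts; unmatched states get None."""
    merged = []
    for tract in tracts:
        scores = state_scores.get(tract["state"], {})
        merged.append({**tract, "fiscal_policy": scores.get("fiscal_policy")})
    return merged

merged = merge_scores(tracts, state_scores)
for row in merged:
    print(row["tract_id"], row["fiscal_policy"])
```

Every Alabama tract ends up with the same score, and a tract whose state has no entry in the index is kept with a missing value instead of being dropped.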

Description of Variables

The complete dataset includes 17 independent variables and 1 dependent variable. Based on their nature, the independent variables were classified into three groups: Freedom Scores, Work Variation, and Ethnic Variation.

Freedom Scores

Overall Freedom:

The weighted sum of all the variables is used to produce the overall freedom ranking of the states. The overall freedom scores rate states on how free they are relative to other states. A score of 1 would correspond to a state’s being one standard deviation above average in every single variable, although in reality, every state scores better on some variables and worse on others. A score of 0 would be equivalent to a state’s being absolutely average on every variable, and a score of −1 to a state’s being one standard deviation below average on every variable (Cato Institute, 2018).
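
The interpretation of the score can be illustrated with a small sketch. It assumes the index is a weighted sum of standardized variables with weights that sum to 1, which is consistent with the description above but much simplified. The variable names, raw values, and weights below are invented for illustration and are not Cato's.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def overall_score(z_by_variable, weights):
    """Weighted sum of one state's z-scores across variables."""
    return sum(weights[name] * z for name, z in z_by_variable.items())

# Hypothetical weights that sum to 1.
weights = {"tax_burden": 0.6, "licensing": 0.4}

# One standard deviation above average everywhere gives 1, exactly average
# gives 0, and one standard deviation below average everywhere gives -1.
above = overall_score({"tax_burden": 1.0, "licensing": 1.0}, weights)
average = overall_score({"tax_burden": 0.0, "licensing": 0.0}, weights)
below = overall_score({"tax_burden": -1.0, "licensing": -1.0}, weights)

# Standardizing hypothetical raw values for three states on one variable.
z = standardize([2.0, 4.0, 6.0])
print(above, average, below, [round(v, 3) for v in z])
```

A real state mixes positive and negative z-scores, so its overall score is pulled toward whichever variables carry the most weight.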

Economic Freedom:

Economic freedom is calculated as the sum of the fiscal and regulatory freedom indices (Cato Institute, 2018).

Personal Freedom:

The personal freedom dimension consists of the following categories: (a) incarceration and arrests for victimless crimes, (b) gun rights, (c) gambling freedom, (d) marriage freedom, (e) educational freedom, (f) alcohol freedom, (g) asset forfeiture, (h) marijuana freedom, (i) tobacco freedom, (j) travel freedom, (k) campaign finance freedom, and (l) other mala prohibita and miscellaneous civil liberties (Cato Institute, 2018).

Regulatory Policy:

The regulatory policy dimension includes categories for land-use freedom and environmental policy, health insurance freedom, labor-market freedom, lawsuit freedom, occupational freedom, miscellaneous regulations that do not fit under another category (such as certificate of need requirements), and cable and telecommunications freedom (Cato Institute, 2018).

Fiscal Policy:

The fiscal policy dimension consists of six variables: (a) state tax revenues, (b) government consumption, (c) local tax revenues, (d) government employment, (e) government debt, and (f) cash and security assets (Cato Institute, 2018).

Work Variation

Professional:

Percentage (%) employed in management, business, science, and arts in a census tract.

Service:

Percentage (%) employed in service jobs in a census tract.

Office:

Percentage (%) employed in sales and office jobs in a census tract.

Construction:

Percentage (%) employed in natural resources, construction, and maintenance in a census tract.

Production:

Percentage (%) employed in production, transportation, and material movement in a census tract.

Unemployed:

Unemployment rate (%) in a census tract.

Self-employed:

Percentage (%) self-employed in a census tract.

Ethnic Variation

Native:

Percentage (%) of population that is Native American or Native Alaskan in a census tract.

White:

Percentage (%) of population that is white in a census tract.

Black:

Percentage (%) of population that is black in a census tract.

Hispanic:

Percentage (%) of population that is Hispanic/Latino in a census tract.

Asian:

Percentage (%) of population that is Asian in a census tract.

Population Histogram and QQ

A baseline analysis of population and income was conducted. The histogram for population was skewed to the right. Census tracts had similar population counts, with a mean of about 4,000. Counties, by contrast, are not evenly sized: some have a population of 1 million and others 10 million. Because their populations are similar, census tracts were easier to investigate than counties. The Q-Q plot confirmed the non-normality, as the values in the upper quartile fell far from the reference line.
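
To make the Q-Q reading concrete, here is a minimal sketch of how the points on such a plot are computed, using only Python's standard library. The sample is synthetic (a lognormal draw standing in for a right-skewed variable), not the census data, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    ordered = sorted(sample)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

rng = random.Random(42)
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]  # synthetic right-skewed data
points = qq_points(skewed)

# With the fitted normal on the x-axis, the reference line is y = x.
# A right-skewed sample rises above that line in the upper tail.
theo_max, obs_max = points[-1]
print(obs_max > theo_max)
```

Plotting the pairs in `points` against the line y = x gives the usual Q-Q picture; a long right tail shows up as the last points bending upward away from the line.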

Income Histogram and QQ

## [1] 3589
## [1] 0
## [1] 69672
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     128   18776   24730   26140   32247   56040
## [1] 10274.98

The raw data for income also appeared very skewed to the right. The data appeared to follow a power-law curve, because some individuals have amassed very large incomes and such outliers skew the distribution. The outliers and NA values were therefore removed; on rechecking, the “cleaned data” appeared approximately normal. The histogram was unimodal, and the points on the Q-Q plot did not stray from the line.
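
The text does not state which rule was used to flag outliers. As one possibility, the sketch below assumes the common 1.5 × IQR rule (the one used for boxplot whiskers) and drops missing values first. The income figures are invented for illustration.

```python
from statistics import quantiles

def remove_outliers_iqr(values, factor=1.5):
    """Drop None entries, then values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in present if low <= v <= high]

# Hypothetical incomes per capita with one missing value and one extreme value.
incomes = [18000, 21000, 24000, 25000, 27000, 31000, 33000, None, 250000]
cleaned = remove_outliers_iqr(incomes)
print(cleaned)
```

A different rule (for example, trimming a fixed percentile) would remove a different set of tracts, so the threshold is a modeling choice worth reporting alongside the cleaned summary.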

Individual EDA of Freedom scores

## Observations per group: 17391, 17605, 21042, 12633. 1001 missing.
##  Factor w/ 4 levels "[-0.827,-0.197]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.8272 -0.1968  0.0176 -0.0811  0.1376  0.3550    1001

## Observations per group: 17296, 17735, 19513, 14127. 1001 missing.
##  Factor w/ 4 levels "[-0.0444,0.0135]",..: 1 1 1 1 1 1 1 1 1 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.0444  0.0135  0.0803  0.0642  0.1064  0.2450    1001

## Observations per group: 17603, 16971, 17167, 16930. 1001 missing.
##  Factor w/ 4 levels "[-0.457,-0.223]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.4569 -0.2228 -0.0737 -0.1470 -0.0322  0.0715    1001

## Observations per group: 17952, 18087, 15977, 16655. 1001 missing.
##  Factor w/ 4 levels "[-0.37,-0.0202]",..: 3 3 3 3 3 3 3 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.3702 -0.0202  0.0634  0.0660  0.1767  0.4024    1001

## Observations per group: 17412, 18197, 16341, 16721. 1001 missing.
##  Factor w/ 4 levels "[-0.814,-0.0878]",..: 2 2 2 2 2 2 2 2 2 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -0.8136 -0.0878  0.0794 -0.0169  0.1637  0.4614    1001

Next, the seventeen independent variables were analyzed. The freedom scores were economic freedom, personal freedom, regulatory policy, fiscal policy, and overall freedom. Each score was split into four quartile groups of roughly equal size, and a boxplot of income per capita was drawn for each group. For all five sets of boxplots, there did not appear to be any differences between the quartiles, as the boxes all covered roughly the same range of income per capita. The histograms did not appear normal: the values were spread irregularly, with large gaps between bins. The Q-Q plots told a similar story, as the points followed a sine-like trend around the line and there were big tails on either end. None of the freedom scores appeared to be distributed normally.

Individual EDA of Work Variations

## Observations per group: 17922, 17175, 17218, 17256. 101 missing.
##  Factor w/ 4 levels "[0,5.3]","(5.3,7.9]",..: 2 4 2 3 1 3 3 3 3 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.300   7.900   9.251  11.600 100.000     101

## Observations per group: 17557, 17275, 17411, 17324. 105 missing.
##  Factor w/ 4 levels "[0,23.7]","(23.7,31.7]",..: 3 1 2 2 4 2 1 4 2 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   23.70   31.70   33.23   41.80  100.00     105

## Observations per group: 17476, 17574, 17443, 17074. 105 missing.
##  Factor w/ 4 levels "[0,20.3]","(20.3,23.9]",..: 2 2 2 3 1 4 3 4 3 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   20.30   23.90   24.12   27.70  100.00     105

## Observations per group: 17771, 17028, 17599, 17169. 105 missing.
##  Factor w/ 4 levels "[0,14.1]","(14.1,18.3]",..: 2 4 4 3 2 2 4 1 2 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   14.10   18.30   19.65   24.00  100.00     105

## Observations per group: 17544, 17264, 17575, 17184. 105 missing.
##  Factor w/ 4 levels "[0,5.4]","(5.4,8.7]",..: 3 3 3 2 1 2 3 2 2 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.400   8.700   9.636  12.800 100.000     105

## Observations per group: 17456, 17586, 17245, 17280. 105 missing.
##  Factor w/ 4 levels "[0,7.7]","(7.7,12.3]",..: 3 4 3 3 3 3 3 1 3 4 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    7.70   12.30   13.36   17.80  100.00     105

## Observations per group: 17940, 17233, 17140, 17254. 105 missing.
##  Factor w/ 4 levels "[0,3.5]","(3.5,5.4]",..: 2 3 4 1 2 3 1 3 2 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.500   5.400   6.109   7.900 100.000     105

Next, the seven work variation variables (professional, production, unemployment, office, service, construction, and self-employed) were examined. Income per capita decreased across the quartiles of unemployment, service, construction, and production. That is to say, as the share of unemployed individuals in a given census tract rose, for example, the income per capita fell. The only work variation associated with an increase in average income was professional work. Income remained relatively stable across the quartiles of office and self-employed. Judging by the histograms, only the proportion of professionals appeared normally distributed; the remaining six work variations were all skewed to the right. For professionals, the Q-Q plot supported normality: the points did not stray far from the line, and the right and left tails were very small. The same cannot be said for the other variables, each of which had an oversized right tail and a relatively small left tail. Overall, the proportion of professionals appeared normally distributed, while the other work variations did not.
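
The quartile comparison behind these boxplots can be sketched as follows: cut an independent variable at its quartiles and summarize income per capita within each group. The data are invented for illustration, and the cut points follow Python's `statistics.quantiles` with the inclusive method, which need not match the binning function used in the original analysis.

```python
from bisect import bisect_left
from statistics import median, quantiles

def quartile_group(values):
    """Assign each value a group 0-3; intervals are closed on the right."""
    cuts = quantiles(values, n=4, method="inclusive")
    return [bisect_left(cuts, v) for v in values]

def median_by_group(groups, outcome):
    """Median of the outcome within each quartile group."""
    by_group = {}
    for g, y in zip(groups, outcome):
        by_group.setdefault(g, []).append(y)
    return {g: median(ys) for g, ys in sorted(by_group.items())}

# Hypothetical tracts: unemployment rate (%) and income per capita (USD).
unemployment = [2.0, 3.5, 5.0, 6.0, 7.5, 9.0, 12.0, 15.0]
income = [41000, 38000, 33000, 30000, 27000, 25000, 21000, 17000]

groups = quartile_group(unemployment)
summary = median_by_group(groups, income)
print(summary)
```

A steadily falling median across groups 0 to 3 is the numeric counterpart of boxes that step downward from left to right; overlapping medians would correspond to the flat pattern seen for the freedom scores.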

Individual EDA of ethnicities

## Observations per group: 18357, 16720, 17177, 17418. 0 missing.
##  Factor w/ 4 levels "[0,0.8]","(0.8,4]",..: 3 4 4 2 4 3 4 3 3 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.80    4.00   13.78   15.32  100.00

## Observations per group: 17726, 17191, 17343, 17412. 0 missing.
##  Factor w/ 4 levels "[0,2.4]","(2.4,7.2]",..: 1 1 1 3 1 3 2 1 1 1 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.40    7.20   17.36   21.50  100.00

## Observations per group: 17518, 17416, 17489, 17249. 0 missing.
##  Factor w/ 4 levels "[0,0.1]","(0.1,1.2]",..: 2 3 3 1 3 1 1 1 1 2 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.100   1.200   4.347   4.400  91.300

## Observations per group: 17437, 17434, 17487, 17314. 0 missing.
##  Factor w/ 4 levels "[0,37.1]","(37.1,70.3]",..: 3 2 3 3 2 3 3 3 4 3 ...

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   37.10   70.30   61.24   88.40  100.00

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.7567   0.4000 100.0000

Finally, the five ethnic variables (Native, White, Black, Hispanic, and Asian) were investigated. The boxplots for White showed an increase in average income across the first, second, and third quartiles but no change in the fourth. The boxplots for Asian showed an increase from the first through the fourth quartile. For Hispanic, average income increased slightly between the first and second quartiles, did not change in the third, and decreased significantly in the fourth. For Black, average income increased between the first and second quartiles and then decreased from the second to the fourth. Overall, average income did appear to change with the concentration of ethnicities in a census tract. The histogram for White was bimodal, with the highest frequency at over 8,000. The histograms for the other four ethnicities were skewed to the right. Based on the histograms, White had the largest shares, followed by Hispanic, Black, Asian, and Native. The points on the Q-Q plots for each of the ethnicity variables followed a curve with large left and right tails. There were also too few nonzero values for Native (its median share is 0) to construct a meaningful boxplot, and its Q-Q plot showed a clear pattern of departure from the line, implying non-normality. Therefore, based on the boxplots, histograms, and Q-Q plots, none of the ethnicity variables appears normally distributed.

Preprocessing KNN

## Observations per group: 34838, 34834. 0 missing.
##  Factor w/ 2 levels "[128,2.47e+04]",..: 2 1 1 1 2 2 1 2 1 1 ...

## 'data.frame':    69672 obs. of  12 variables:
##  $ TotalPop    : int  1948 2156 2968 4423 10763 3851 2761 3187 10915 5668 ...
##  $ Hispanic    : num  0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
##  $ White       : num  87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
##  $ Black       : num  7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
##  $ Asian       : num  0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
##  $ Professional: num  34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
##  $ Service     : num  17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
##  $ Office      : num  21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
##  $ Construction: num  11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
##  $ Production  : num  15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
##  $ Unemployment: num  5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
##  $ ipc         : Factor w/ 2 levels "[128,2.47e+04]",..: 2 1 1 1 2 2 1 2 1 1 ...
## [1] 626
## [1] 0
## [1] 12
## 'data.frame':    69567 obs. of  12 variables:
##  $ TotalPop    : int  1948 2156 2968 4423 10763 3851 2761 3187 10915 5668 ...
##  $ Hispanic    : num  0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
##  $ White       : num  87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
##  $ Black       : num  7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
##  $ Asian       : num  0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
##  $ Professional: num  34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
##  $ Service     : num  17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
##  $ Office      : num  21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
##  $ Construction: num  11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
##  $ Production  : num  15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
##  $ Unemployment: num  5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
##  $ ipc         : Factor w/ 2 levels "[128,2.47e+04]",..: 2 1 1 1 2 2 1 2 1 1 ...
##  - attr(*, "na.action")= 'omit' Named int  1484 1807 2299 2499 2789 4259 4444 4448 4449 4477 ...
##   ..- attr(*, "names")= chr  "1514" "1851" "2370" "2574" ...

KNN

Pairs

Train-Test split 3:1

KNN results

## 
## Attaching package: 'class'
## The following objects are masked from 'package:FNN':
## 
##     knn, knn.cv
##                     dat.testLabels
## dat_pred             [128,2.47e+04] (2.47e+04,5.6e+04]
##   (2.47e+04,5.6e+04]           2178               9533
##   [128,2.47e+04]               9240               1885
## [1] 22836
## [1] 2178 1885
## [1] 0.1779208
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##   8.220792e-01   6.441583e-01   8.170561e-01   8.270201e-01   5.000000e-01 
## AccuracyPValue  McnemarPValue 
##   0.000000e+00   4.627777e-06
##          Sensitivity          Specificity       Pos Pred Value 
##            0.8092486            0.8349098            0.8305618 
##       Neg Pred Value            Precision               Recall 
##            0.8140210            0.8305618            0.8092486 
##                   F1           Prevalence       Detection Rate 
##            0.8197667            0.5000000            0.4046243 
## Detection Prevalence    Balanced Accuracy 
##            0.4871694            0.8220792
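
The summary statistics above follow directly from the four cells of the confusion matrix. The minimal Python sketch below recomputes them for illustration. It treats the lower income class `[128,2.47e+04]` as the positive class, which is the convention that reproduces the reported sensitivity and specificity.

```python
# Cells of the confusion matrix reported above.
# "low" = [128,2.47e+04] is treated as the positive class.
tp = 9240  # predicted low,  actually low
fn = 2178  # predicted high, actually low
fp = 1885  # predicted low,  actually high
tn = 9533  # predicted high, actually high

total = tp + fn + fp + tn          # size of the test set
accuracy = (tp + tn) / total
error_rate = (fn + fp) / total
sensitivity = tp / (tp + fn)       # also called recall
specificity = tn / (tn + fp)
precision = tp / (tp + fp)         # positive predictive value
f1 = 2 * precision * sensitivity / (precision + sensitivity)

# Cohen's kappa compares observed accuracy with the agreement expected by chance.
expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} kappa={kappa:.4f}")
```

Because the two classes were formed by a median split, chance agreement is 0.5, so kappa is simply twice the accuracy's margin over 0.5.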

Selecting the correct “k”

How does “k” affect classification accuracy? To find out, a function was written to calculate classification accuracy for different values of “k.”

##  num [1:2, 1:11] 1 0.795 3 0.822 5 ...
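
Such a function can be sketched as follows. This is an illustrative, pure-Python version, not the code that produced the output above. The data are synthetic, and the grid of odd k values from 1 to 21 is an assumption that is consistent with the 11 columns and the first values (1, 3, 5) shown in the output.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy_for_k(train_x, train_y, test_x, test_y, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y))
    return hits / len(test_y)

# Synthetic stand-in for scaled tract features: two overlapping clusters.
rng = random.Random(0)

def cluster(center, label, n):
    return [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label) for _ in range(n)]

data = cluster(0.0, "low", 60) + cluster(2.0, "high", 60)
rng.shuffle(data)
train, test = data[:90], data[90:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Odd values of k avoid tied votes between two classes.
results = [(k, accuracy_for_k(train_x, train_y, test_x, test_y, k)) for k in range(1, 22, 2)]
for k, acc in results:
    print(k, round(acc, 3))
```

Very small k tends to overfit to noisy neighbors, while very large k smooths over local structure, so accuracy typically rises and then levels off or falls as k grows.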

Conclusion

Overall, this analysis found several ways in which our independent variables reliably predict income in communities across the United States. The Freedom variables drawn from the Cato Institute performed worst, with high internal correlation and little predictive power. Ethnicity and work type proportions had stronger predictive power, with the latter having the most powerful effects. However, these variables suffer from being largely non-normal, with a rightward skew, and from having high internal correlations, both between and within the two categories. Altogether, these variables allow us to predict income per capita at the census tract level with high reliability (R-squared = 0.67); this is quite impressive given the simplicity of the data, which, for instance, does not directly include any information about the age or education of the population.

Moving forward, this analysis allows for several expansions. The first is to integrate new data, such as the age and education status of census tract residents. Additionally, it may be valuable to consider each of the individual freedom measures on its own, to negate the influence of high internal correlation. Finally, it would be interesting to test whether there are differences driven by geographic density, which can be estimated with just the currently accessible data.

Bibliography

Cato Institute. (2018). Freedom in the 50 States. Retrieved March 23, 2020, from https://www.freedominthe50states.org/how-its-calculated

MuonNeutrino. (2015). US Census Demographic Data: Demographic and Economic Data for Tracts and Counties. Retrieved March 23, 2020, from https://www.kaggle.com/muonneutrino/us-census-demographic-data

U.S. Census Bureau (2019). “Annual Estimates of the Resident Population for the United States, Regions, States, and Puerto Rico: April 1, 2010 to July 1, 2019”. 2010-2019 Population Estimates. United States Census Bureau, Population Division. December 30, 2019. Retrieved January 27, 2020.

U.S. Census Bureau (2017). “American FactFinder - Results”. U.S. Census Bureau. Retrieved 2017-12-13.

U.S. Census Bureau (2013). “2010 Census Summary File 1: GEOGRAPHIC IDENTIFIERS”. American Factfinder. US Census. Retrieved 18 October 2013.